Goto

Collaborating Authors

 inventory level


A Computational Method for Solving the Stochastic Joint Replenishment Problem in High Dimensions

Ata, Barış, van Eekelen, Wouter, Zhong, Yuan

arXiv.org Artificial Intelligence

We consider a discrete-time formulation for a class of high-dimensional stochastic joint replenishment problems. First, we approximate the problem by a continuous-time impulse control problem. Exploiting connections among the impulse control problem, backward stochastic differential equations (BSDEs) with jumps, and the stochastic target problem, we develop a novel, simulation-based computational method that relies on deep neural networks to solve the impulse control problem. Based on that solution, we propose an implementable inventory control policy for the original (discrete-time) stochastic joint replenishment problem, and test it against the best available benchmarks in a series of test problems. For the problems studied thus far, our method matches or beats the best benchmark we could find, and it is computationally feasible up to at least 50 dimensions -- that is, 50 stock-keeping units (SKUs).


AIM-Bench: Evaluating Decision-making Biases of Agentic LLM as Inventory Manager

Zhao, Xuhua, Xie, Yuxuan, Chen, Caihua, Sun, Yuxiang

arXiv.org Artificial Intelligence

Recent advances in mathematical reasoning and the long-term planning capabilities of large language models (LLMs) have precipitated the development of agents, which are being increasingly leveraged in business operations processes. Decision models to optimize inventory levels are one of the core elements of operations management. However, the capabilities of the LLM agent in making inventory decisions in uncertain contexts, as well as the decision-making biases (e.g. framing effect, etc.) of the agent, remain largely unexplored. This prompts concerns regarding the capacity of LLM agents to effectively address real-world problems, as well as the potential implications of biases that may be present. To address this gap, we introduce AIM-Bench, a novel benchmark designed to assess the decision-making behaviour of LLM agents in uncertain supply chain management scenarios through a diverse series of inventory replenishment experiments. Our results reveal that different LLMs typically exhibit varying degrees of decision bias that are similar to those observed in human beings. In addition, we explored strategies to mitigate the pull-to-centre effect and the bullwhip effect, namely cognitive reflection and implementation of information sharing. These findings underscore the need for careful consideration of the potential biases in deploying LLMs in Inventory decision-making scenarios. We hope that these insights will pave the way for mitigating human decision bias and developing human-centred decision support systems for supply chains.


Structure-Informed Deep Reinforcement Learning for Inventory Management

Maggiar, Alvaro, Andaz, Sohrab, Bagaria, Akhil, Eisenach, Carson, Foster, Dean, Gottesman, Omer, Perrault-Joncas, Dominique

arXiv.org Artificial Intelligence

This paper investigates the application of Deep Reinforcement Learning (DRL) to classical inventory management problems, with a focus on practical implementation considerations. We apply a DRL algorithm based on DirectBackprop to several fundamental inventory management scenarios including multi-period systems with lost sales (with and without lead times), perishable inventory management, dual sourcing, and joint inventory procurement and removal. The DRL approach learns policies across products using only historical information that would be available in practice, avoiding unrealistic assumptions about demand distributions or access to distribution parameters. We demonstrate that our generic DRL implementation performs competitively against or outperforms established benchmarks and heuristics across these diverse settings, while requiring minimal parameter tuning. Through examination of the learned policies, we show that the DRL approach naturally captures many known structural properties of optimal policies derived from traditional operations research methods. To further improve policy performance and interpretability, we propose a Structure-Informed Policy Network technique that explicitly incorporates analytically-derived characteristics of optimal policies into the learning process. This approach can help interpretability and add robustness to the policy in out-of-sample performance, as we demonstrate in an example with realistic demand data. Finally, we provide an illustrative application of DRL in a non-stationary setting. Our work bridges the gap between data-driven learning and analytical insights in inventory management while maintaining practical applicability.


Market Making Strategies with Reinforcement Learning

Vicente, Óscar Fernández

arXiv.org Artificial Intelligence

This thesis presents the results of a comprehensive research project focused on applying Reinforcement Learning (RL) to the problem of market making in financial markets. Market makers (MMs) play a fundamental role in providing liquidity, yet face significant challenges arising from inventory risk, competition, and non-stationary market dynamics. This research explores how RL, particularly Deep Reinforcement Learning (DRL), can be employed to develop autonomous, adaptive, and profitable market making strategies. The study begins by formulating the MM task as a reinforcement learning problem, designing agents capable of operating in both single-agent and multi-agent settings within a simulated financial environment. It then addresses the complex issue of inventory management using two complementary approaches: reward engineering and Multi-Objective Reinforcement Learning (MORL). While the former uses dynamic reward shaping to guide behavior, the latter leverages Pareto front optimization to explicitly balance competing objectives. To address the problem of non-stationarity, the research introduces POW-dTS, a novel policy weighting algorithm based on Discounted Thompson Sampling. This method allows agents to dynamically select and combine pretrained policies, enabling continual adaptation to shifting market conditions. The experimental results demonstrate that the proposed RL-based approaches significantly outperform traditional and baseline algorithmic strategies across various performance metrics. Overall, this research thesis contributes new methodologies and insights for the design of robust, efficient, and adaptive market making agents, reinforcing the potential of RL to transform algorithmic trading in complex financial systems.


Coordinated Communication and Inventory Optimization in Multi-Retailer Supply Chains

Sudhakara, Sagar, Zhang, Yuchong

arXiv.org Artificial Intelligence

We consider a multi-retailer supply chain where each retailer can dynamically choose when to share information (e.g., local inventory levels or demand observations) with other retailers, incurring a communication cost for each sharing event. This flexible information exchange mechanism contrasts with fixed protocols such as always sharing or never sharing. We formulate a joint optimization of inventory control and communication strategies, aiming to balance the trade-off between communication overhead and operational performance (service levels, holding, and stockout costs). We adopt a common information framework and derive a centralized Partially Observable Markov Decision Process (POMDP) model for a supply chain coordinator. Solving this coordinator's POMDP via dynamic programming characterizes the structure of optimal policies, determining when retailers should communicate and how they should adjust orders based on available information. We show that, in this setting, retailers can often act optimally by sharing only limited summaries of their private data, reducing communication frequency without compromising performance. We also incorporate practical constraints on communication frequency and propose an approximate point-based POMDP solution method (PBVI/SARSOP) to address computational complexity. Numerical experiments on multi-retailer inventory scenarios demonstrate that our approach significantly improves the cost-service trade-off compared to static information sharing policies, effectively optimizing the schedule of information exchange for cooperative inventory control.


Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice

Kumar, Akshit, Peng, Tianyi, Wu, Yuhang, Zeevi, Assaf

arXiv.org Artificial Intelligence

Large language models (LLMs) have exhibited expert-level capabilities across various domains. However, their abilities to solve problems in Operations Research (OR) -- the analysis and optimization of mathematical models derived from real-world problems or their verbal descriptions -- remain underexplored. In this work, we take a first step toward evaluating LLMs' abilities to solve stochastic modeling problems, a core class of OR problems characterized by uncertainty and typically involving tools from probability, statistics, and stochastic processes. We manually procure a representative set of graduate-level homework and doctoral qualification-exam problems and test LLMs' abilities to solve them. We further leverage SimOpt, an open-source library of simulation-optimization problems and solvers, to investigate LLMs' abilities to make real-world decisions under uncertainty. Our results show that, though a nontrivial amount of work is still needed to reliably automate the stochastic modeling pipeline in reality, state-of-the-art LLMs demonstrate proficiency on par with human experts in both classroom and practical settings. These findings highlight the potential of building AI agents that assist OR researchers and amplify the real-world impact of OR through automation.


A Study of Data-driven Methods for Inventory Optimization

Ping, Lee Yeung, Wong, Patrick, Han, Tan Cheng

arXiv.org Artificial Intelligence

This paper shows a comprehensive analysis of three algorithms (Time Series, Random Forest (RF) and Deep Reinforcement Learning) into three inventory models (the Lost Sales, Dual-Sourcing and Multi-Echelon Inventory Model). These methodologies are applied in the supermarket context. The main purpose is to analyse efficient methods for the data-driven. Their possibility, potential and current challenges are taken into consideration in this report. By comparing the results in each model, the effectiveness of each algorithm is evaluated based on several key performance indicators, including forecast accuracy, adaptability to market changes, and overall impact on inventory costs and customer satisfaction levels. The data visualization tools and statistical metrics are the indicators for the comparisons and show some obvious trends and patterns that can guide decision-making in inventory management. These tools enable managers to not only track the performance of different algorithms in real-time but also to drill down into specific data points to understand the underlying causes of inventory fluctuations. This level of detail is crucial for pinpointing inefficiencies and areas for improvement within the supply chain.


A Minimax-MDP Framework with Future-imposed Conditions for Learning-augmented Problems

Chen, Xin, Chen, Yuze, Zhou, Yuan

arXiv.org Artificial Intelligence

We study a class of sequential decision-making problems with augmented predictions, potentially provided by a machine learning algorithm. In this setting, the decision-maker receives prediction intervals for unknown parameters that become progressively refined over time, and seeks decisions that are competitive with the hindsight optimal under all possible realizations of both parameters and predictions. We propose a minimax Markov Decision Process (minimax-MDP) framework, where the system state consists of an adversarially evolving environment state and an internal state controlled by the decision-maker. We introduce a set of future-imposed conditions that characterize the feasibility of minimax-MDPs and enable the design of efficient, often closed-form, robustly competitive policies. We illustrate the framework through three applications: multi-period inventory ordering with refining demand predictions, resource allocation with uncertain utility functions, and a multi-phase extension of the minimax-MDP applied to the inventory problem with time-varying ordering costs. Our results provide a tractable and versatile approach to robust online decision-making under predictive uncertainty.


Stochastic Optimization of Inventory at Large-scale Supply Chains

Jin, Zhaoyang Larry, Maasoumy, Mehdi, Liu, Yimin, Zheng, Zeshi, Ren, Zizhuo

arXiv.org Artificial Intelligence

Today's global supply chains face growing challenges due to rapidly changing market conditions, increased network complexity and inter-dependency, and dynamic uncertainties in supply, demand, and other factors. To combat these challenges, organizations employ Material Requirements Planning (MRP) software solutions to set inventory stock buffers - for raw materials, work-in-process goods, and finished products - to help them meet customer service levels. However, holding excess inventory further complicates operations and can lock up millions of dollars of capital that could be otherwise deployed. Furthermore, most commercially available MRP solutions fall short in considering uncertainties and do not result in optimal solutions for modern enterprises. At C3 AI, we fundamentally reformulate the inventory management problem as a constrained stochastic optimization. We then propose a simulation-optimization framework that minimizes inventory and related costs while maintaining desired service levels. The framework's goal is to find the optimal reorder parameters that minimize costs subject to a pre-defined service-level constraint and all other real-world operational constraints. These optimal reorder parameters can be fed back into an MRP system to drive optimal order placement, or used to place optimal orders directly. This approach has proven successful in reducing inventory levels by 10-35 percent, resulting in hundreds of millions of dollars of economic benefit for major enterprises at a global scale.


Dynamic Pricing with Adversarially-Censored Demands

Xu, Jianyu, Wang, Yining, Chen, Xi, Wang, Yu-Xiang

arXiv.org Machine Learning

We study an online dynamic pricing problem where the potential demand at each time period $t=1,2,\ldots, T$ is stochastic and dependent on the price. However, a perishable inventory is imposed at the beginning of each time $t$, censoring the potential demand if it exceeds the inventory level. To address this problem, we introduce a pricing algorithm based on the optimistic estimates of derivatives. We show that our algorithm achieves $\tilde{O}(\sqrt{T})$ optimal regret even with adversarial inventory series. Our findings advance the state-of-the-art in online decision-making problems with censored feedback, offering a theoretically optimal solution against adversarial observations.